Abstract. The purpose of this work is to survey what is known about the linear independence
of spikes and sines. The paper provides new results for the case where the locations of the spikes
and the frequencies of the sines are chosen at random. This problem is equivalent to studying the
spectral norm of a random submatrix drawn from the discrete Fourier transform matrix. The proof
depends on an extrapolation argument of Bourgain and Tzafriri.

1. Introduction



An investigation central to sparse approximation is whether a given collection of impulses and 
complex exponentials is linearly independent. This inquiry appears in the early paper of Donoho
and Stark on uncertainty principles [DS89], and it has been repeated and amplified in the work
of subsequent authors. Indeed, researchers in sparse approximation have developed a much deeper
understanding of general dictionaries by probing the structure of the unassuming dictionary that
contains only spikes and sines.

The purpose of this work is to survey what is known about the linear independence of spikes
and sines and to provide some new results on random subcollections chosen from this dictionary.
The method is adapted from a paper of Bourgain and Tzafriri [BT91]. The advantage of this approach
is that it avoids some of the complicated combinatorial arguments that are used in related works,
e.g., [CRT06]. The proof also applies to other types of dictionaries, although we do not pursue this
line of inquiry here.

1.1. Spikes and Sines. Let us shift to formal discussion. We work in the inner-product space
$\mathbb{C}^n$, and we use the symbol $*$ for the conjugate transpose. Define the Hermitian inner product
$\langle x, y \rangle = y^* x$ and the $\ell_2$ vector norm $\|x\| = |\langle x, x \rangle|^{1/2}$. We also write $\|\cdot\|$ for the spectral norm,
i.e., the operator norm for linear maps from $(\mathbb{C}^n, \ell_2)$ to itself.

We consider two orthonormal bases for $\mathbb{C}^n$. The standard basis $\{e_j : j = 1, 2, \dots, n\}$ is given by

$$e_j(t) = \begin{cases} 1, & t = j \\ 0, & t \ne j \end{cases} \qquad \text{for } t = 1, 2, \dots, n.$$

We often refer to the elements of the standard basis as spikes or impulses. The Fourier basis
$\{f_j : j = 1, 2, \dots, n\}$ is given by

$$f_j(t) = \frac{1}{\sqrt{n}} \, e^{2\pi i j t / n} \qquad \text{for } t = 1, 2, \dots, n.$$



We often refer to the elements of the Fourier basis as sines or complex exponentials. 

The discrete Fourier transform (DFT) is the $n \times n$ matrix $F$ whose rows are $f_1^*, f_2^*, \dots, f_n^*$. The
matrix $F$ is unitary. In particular, its spectral norm $\|F\| = 1$. Moreover, the entries of the DFT
matrix are bounded in magnitude by $n^{-1/2}$. Let $T$ and $\Omega$ be subsets of $\{1, 2, \dots, n\}$. We write
$F_{\Omega T}$ for the restriction of $F$ to the rows listed in $\Omega$ and the columns listed in $T$. Since $F_{\Omega T}$ is a
submatrix of the DFT matrix, its spectral norm does not exceed one.
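As a concrete check of these definitions, the following Python sketch builds the DFT matrix with the 1-based indexing used above and extracts a submatrix. The helper names `dft_matrix`, `submatrix`, and `max_unitarity_defect` are illustrative assumptions, not notation from the paper, and only the standard library is used.

```python
import cmath
import math


def dft_matrix(n):
    """Return the n x n DFT matrix F as a list of rows.

    Row j (1-based) is the conjugate transpose of the Fourier basis
    vector f_j, so entry (j, t) equals exp(-2*pi*i*j*t/n) / sqrt(n).
    """
    return [[cmath.exp(-2j * cmath.pi * j * t / n) / math.sqrt(n)
             for t in range(1, n + 1)]
            for j in range(1, n + 1)]


def submatrix(F, rows, cols):
    """Restrict F to the 1-based row indices `rows` and column indices `cols`."""
    return [[F[r - 1][c - 1] for c in cols] for r in rows]


def max_unitarity_defect(F):
    """Largest entry of |F F^* - I| in magnitude; near zero when F is unitary."""
    n = len(F)
    worst = 0.0
    for j in range(n):
        for k in range(n):
            inner = sum(F[j][t] * F[k][t].conjugate() for t in range(n))
            target = 1.0 if j == k else 0.0
            worst = max(worst, abs(inner - target))
    return worst
```

For example, `submatrix(dft_matrix(8), [1, 3, 4], [2, 5])` plays the role of $F_{\Omega T}$ with $\Omega = \{1, 3, 4\}$ and $T = \{2, 5\}$.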

We use the analysts' convention that upright letters represent universal constants. We reserve 
c for small constants and C for large constants. The value of a constant may change at each 
appearance. 

1.2. Linear Independence. Let $T$ and $\Omega$ be subsets of $\{1, 2, \dots, n\}$. Consider the collection of
spikes and sines listed in these sets:

$$\mathcal{X} = \mathcal{X}(T, \Omega) = \{e_j : j \in T\} \cup \{f_j : j \in \Omega\}.$$

Today, we will discuss methods for determining when $\mathcal{X}$ is linearly independent. Since a linearly
independent collection in $\mathbb{C}^n$ contains at most $n$ vectors, we obtain a simple necessary condition
$|T| + |\Omega| \le n$. Developing sufficient conditions, however, requires more sophistication.

We approach the problem by studying the Gram matrix $G = G(\mathcal{X})$, whose entries are the inner
products between pairs of elements from $\mathcal{X}$. It is easy to check that the Gram matrix can be
expressed as

$$G = \begin{bmatrix} I_{|\Omega|} & F_{\Omega T} \\ (F_{\Omega T})^* & I_{|T|} \end{bmatrix}$$

where $I_m$ denotes an $m \times m$ identity matrix and $|\cdot|$ denotes the cardinality of a set.

It is well known that the collection $\mathcal{X}$ is linearly independent if and only if its Gram matrix is
nonsingular. The Gram matrix is nonsingular if and only if its eigenvalues are nonzero. A basic
(and easily confirmed) fact of matrix analysis is that the extreme eigenvalues of $G$ are $1 \pm \|F_{\Omega T}\|$.
Therefore, the collection $\mathcal{X}$ is linearly independent if and only if $\|F_{\Omega T}\| < 1$.
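The eigenvalue fact quoted above can be confirmed in a few lines. The following derivation is a sketch: the symbols $\mu$, $u$, $v$, and $\sigma$ are introduced here for illustration only, and $F$ abbreviates $F_{\Omega T}$.

```latex
% Eigen-equation for the block Gram matrix, with u in C^{|\Omega|} and v in C^{|T|}:
G \begin{bmatrix} u \\ v \end{bmatrix} = \mu \begin{bmatrix} u \\ v \end{bmatrix}
\iff
F v = (\mu - 1)\, u \quad\text{and}\quad F^* u = (\mu - 1)\, v
\implies
F^* F v = (\mu - 1)^2 \, v .
% If \mu \neq 1, then v \neq 0, so (\mu - 1)^2 is an eigenvalue of F^* F and
% \mu = 1 \pm \sigma for some singular value \sigma of F.
% Conversely, if F^* F v = \sigma^2 v with \sigma > 0, then u = \pm \sigma^{-1} F v
% yields an eigenvector of G with eigenvalue 1 \pm \sigma.
% Taking \sigma = \|F\| gives the extreme eigenvalues 1 \pm \|F\|.
```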

One may also attempt to quantify the extent to which the collection $\mathcal{X}$ is linearly independent.
To that end, define the condition number $\kappa$ of the Gram matrix, which is the ratio of its largest
eigenvalue to its smallest eigenvalue:

$$\kappa(G) = \frac{1 + \|F_{\Omega T}\|}{1 - \|F_{\Omega T}\|}.$$

If $\|F_{\Omega T}\|$ is bounded away from one, then the condition number is constant. One may interpret
this statement as evidence that the collection $\mathcal{X}$ is strongly linearly independent. The reason is that
the condition number is the reciprocal of the relative spectral-norm distance between $G$ and the
nearest singular matrix [Dem97, p. 33]. As we have mentioned, $G$ is singular if and only if $\mathcal{X}$ is
linearly dependent.

This article focuses on statements about linear independence, rather than conditioning. Nevertheless,
many results can be adapted to obtain precise information about the size of $\|F_{\Omega T}\|$.



1.3. Summary of Results. The major result of this paper is to show that a random collection of
spikes and sines is extremely likely to be strongly linearly independent, provided that the total
number of spikes and sines does not exceed a constant proportion of the ambient dimension. We
also provide a result which shows that the norm of a properly scaled random submatrix of the DFT
is at most constant with high probability. For a more detailed statement of these theorems, turn
to Section 2.3.



1.4. Outline. The next section provides a survey of bounds on the norm of a submatrix of the 
DFT matrix. It concludes with detailed new results for the case where the submatrix is random. 
Section 3 contains a proof of the new results. Numerical experiments are presented in Section 4,
and Section 5 describes some additional research directions. Appendix A contains a proof of the
key background result.

2. History and Results

The strange, eventful history of our problem can be viewed as a sequence of bounds on the norm of
the matrix $F_{\Omega T}$. Results in the literature can be divided into two classes: the case where the sets
$\Omega$ and $T$ are fixed and the case where one of the sets is random. In this work, we investigate what
happens when both sets are chosen randomly.

2.1. Bounds for fixed sets. An early result, due to Donoho and Stark [DS89], asserts that an
arbitrary collection of spikes and sines is linearly independent, provided that the collection is not
too big.

Theorem 1 (Donoho-Stark). Suppose that $|T| \, |\Omega| < n$. Then $\|F_{\Omega T}\| < 1$.

The original argument relies on the fact that $F$ is a Vandermonde matrix. We present a short
proof that is completely analytic. A similar argument using an inequality of Schur yields the more
general result of Elad and Bruckstein [EB02, Thm. 1].

Proof. The entries of the $|\Omega| \times |T|$ matrix $F_{\Omega T}$ are uniformly bounded in magnitude by $n^{-1/2}$. Since the Frobenius
norm dominates the spectral norm, $\|F_{\Omega T}\|^2 \le \|F_{\Omega T}\|_F^2 \le |\Omega| \, |T| / n$. Under the hypothesis of the
theorem, this quantity is strictly less than one. □

Theorem 1 has an elegant corollary that follows immediately from the basic inequality for geometric
and arithmetic means.

Corollary 2 (Donoho-Stark). Suppose that $|T| + |\Omega| < 2\sqrt{n}$. Then $\|F_{\Omega T}\| < 1$.
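For completeness, the one-line deduction of Corollary 2 from Theorem 1 can be written out as follows; this worked step is our own expansion of the remark about geometric and arithmetic means.

```latex
% Geometric mean <= arithmetic mean, applied to the two cardinalities:
\sqrt{|T| \, |\Omega|} \;\le\; \frac{|T| + |\Omega|}{2} \;<\; \sqrt{n}
\quad\Longrightarrow\quad
|T| \, |\Omega| \;<\; n ,
% so the hypothesis of Theorem 1 holds and \|F_{\Omega T}\| < 1.
```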

The contrapositive of Theorem 1 is usually interpreted as a discrete uncertainty principle: a
vector and its discrete Fourier transform cannot simultaneously be sparse. To express this claim
quantitatively, we define the $\ell_0$ "quasinorm" of a vector by $\|\alpha\|_0 = |\{j : \alpha_j \ne 0\}|$.

Corollary 3 (Donoho-Stark). Fix a vector $x \in \mathbb{C}^n$. Consider the representations of $x$ in the
standard basis and the Fourier basis:

$$x = \sum_{j=1}^{n} \alpha_j e_j \quad\text{and}\quad x = \sum_{j=1}^{n} \beta_j f_j.$$

Then $\|\alpha\|_0 \, \|\beta\|_0 \ge n$.

The example of the Dirac comb shows that Theorem 1 and its corollaries are sharp. Suppose
that $n$ is a square, and let $T = \Omega = \{\sqrt{n}, 2\sqrt{n}, 3\sqrt{n}, \dots, n\}$. On account of the Poisson summation
formula,

$$\sum_{j \in T} e_j = \sum_{j \in \Omega} f_j.$$

Therefore, the set of vectors $\mathcal{X}(T, \Omega)$ is linearly dependent and $|T| \, |\Omega| = n$.
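The Dirac comb identity is easy to verify numerically. The sketch below assumes the basis definitions of Section 1.1 (1-based indices, sines normalized by $n^{-1/2}$); the function names are hypothetical and chosen only for this illustration.

```python
import cmath
import math


def spike(j, n):
    """Standard basis vector e_j in C^n (1-based index j)."""
    return [1.0 if t == j else 0.0 for t in range(1, n + 1)]


def sine(j, n):
    """Fourier basis vector f_j in C^n: f_j(t) = exp(2*pi*i*j*t/n) / sqrt(n)."""
    return [cmath.exp(2j * cmath.pi * j * t / n) / math.sqrt(n)
            for t in range(1, n + 1)]


def dirac_comb_gap(n):
    """Max entrywise gap between the sum of comb spikes and the sum of comb sines.

    Requires n to be a perfect square; the comb is {r, 2r, ..., n} with r = sqrt(n).
    """
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n must be a perfect square")
    comb = [k * r for k in range(1, r + 1)]
    sum_spikes = [sum(col) for col in zip(*(spike(j, n) for j in comb))]
    sum_sines = [sum(col) for col in zip(*(sine(j, n) for j in comb))]
    return max(abs(a - b) for a, b in zip(sum_spikes, sum_sines))
```

A gap at the level of floating-point rounding, for instance from `dirac_comb_gap(16)`, exhibits an explicit linear dependence among $2\sqrt{n}$ spikes and sines.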

The substance behind this example is that the abelian group $\mathbb{Z}/n\mathbb{Z}$ contains nontrivial subgroups
when $n$ is composite. The presence of these subgroups leads to arithmetic cancelations for properly
chosen $T$ and $\Omega$. See [DS89] for additional discussion.

One way to eradicate the cancelation phenomenon is to require that $n$ be prime. In this case,
the group $\mathbb{Z}/n\mathbb{Z}$ has no nontrivial subgroup. As a result, much larger collections of spikes and sines
are linearly independent. Compare the following result with Corollary 2.

Theorem 4 (Tao [Tao05, Thm. 1.1]). Suppose that $n$ is prime. If $|T| + |\Omega| \le n$, then $\|F_{\Omega T}\| < 1$.

The proof of Theorem 4 is algebraic in nature, and it does not provide information about conditioning.
Indeed, one expects that some submatrices have norms very near to one.

When $n$ is composite, subgroups of $\mathbb{Z}/n\mathbb{Z}$ exist, but they have a very rigid structure. Consequently,
one can also avoid cancelations by choosing $T$ and $\Omega$ with care. In particular, one may
consider the situation where $T$ is clustered and $\Omega$ is spread out. Donoho and Logan [DL92] study
this case using the analytic principle of the large sieve, a powerful technique from number theory
that can be traced back to the 1930s. See the lecture notes [Jam06] for an engaging introduction
and references.

Here, we simply restate the (sharp) large sieve inequality [Jam06, LS1.1] in a manner that
exposes its connection with our problem. The spread of a set is measured as the difference (modulo
$n$) between the closest pair of indices. Formally, define

$$\operatorname{spread}(\Omega) = \min\{ |j - k \bmod n| : j, k \in \Omega, \ j \ne k \}$$

with the convention that the modulus returns values in the symmetric range $\{-\lceil n/2 \rceil + 1, \dots, \lfloor n/2 \rfloor\}$.
Observe that $|\Omega| \cdot \operatorname{spread}(\Omega) \le n$.
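The spread is simple to compute. The following sketch uses the fact that the closest pair of indices on the cycle of length $n$ is always a pair of cyclic neighbors after sorting; the function name `spread` mirrors the notation above, but the implementation is our own illustration.

```python
def spread(omega, n):
    """Smallest circular distance (mod n) between two distinct indices in omega.

    The minimum over all pairs is attained by neighbors in cyclic order,
    so it suffices to sort the residues and scan the consecutive gaps,
    including the wrap-around gap from the last residue back to the first.
    """
    pts = sorted(set(j % n for j in omega))
    if len(pts) < 2:
        raise ValueError("spread needs at least two distinct indices mod n")
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0] + n - pts[-1])  # wrap-around gap
    return min(gaps)
```

For the Dirac comb with $n = 16$, the set $\{4, 8, 12, 16\}$ has spread 4, so $|\Omega| \cdot \operatorname{spread}(\Omega) = n$ and the observation above holds with equality.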

Theorem 5 (Large Sieve Inequality). Suppose that $T$ is a block of adjacent indices:

$$T = \{m + 1, m + 2, \dots, m + |T|\} \quad\text{for an integer } m. \tag{2.1}$$

For each set $\Omega$, we have

$$\|F_{\Omega T}\|^2 \le \frac{|T| + n/\operatorname{spread}(\Omega) - 1}{n}.$$

In particular, when $T$ has form (2.1), the bound $|T| + n/\operatorname{spread}(\Omega) < n + 1$ implies that $\|F_{\Omega T}\| < 1$.

Of course, we can reverse the roles of $T$ and $\Omega$ in this theorem on account of duality. The same
observation applies to other results where the two sets do not participate in the same way.

The discussion above shows that there are cases where delicately constructed sets $T$ and $\Omega$ lead
to linearly dependent collections of spikes and sines. Explicit conditions that rule out the bad
examples are unknown, but nevertheless the bad examples turn out to be quite rare. To quantify
this intuition, we must introduce probability.

2.2. Bounds when one set is random. In their work [DS89, Sec. 7.3], Donoho and Stark discuss
numerical experiments designed to study what happens when one of the sets of spikes or sines is 
drawn at random. They conjecture that the situation is vastly different from the case where the 
spikes and sines are chosen in an arbitrary fashion. Within the last few years, researchers have made 
substantial theoretical progress on this question. Indeed, we will see that the linearly dependent 
collections form a vanishing proportion of all collections, provided that the total number of spikes 
and sines is slightly smaller than the dimension n of the vector space. 

First, we describe a probability model for random sets. Fix a number $m \le n$, and consider the
class $\mathcal{S}_m$ of index sets that have cardinality $m$:

$$\mathcal{S}_m = \{S : S \subset \{1, 2, \dots, n\} \text{ and } |S| = m\}.$$

We may construct a random set $\Omega$ by drawing an element from $\mathcal{S}_m$ uniformly at random. That is,

$$\mathbb{P}\{\Omega = S\} = |\mathcal{S}_m|^{-1} \quad\text{for each } S \in \mathcal{S}_m.$$

In the sequel, we substitute the symbol $|\Omega|$ for the letter $m$, and we say "$\Omega$ is a random set with
cardinality $|\Omega|$" to describe this type of random variable. This phrase should cause no confusion,
and it allows us to avoid extra notation for the cardinality.

In the sparse approximation literature, the first rigorous result on random sets is due to Candès
and Romberg. They study the case where one of the sets is arbitrary and the other set is chosen
at random. Their proof draws heavily on their prior work with Tao [CRT06].

Theorem 6 (Candès-Romberg [CR06, Thm. 3.2]). Fix a number $s \ge 1$. Suppose that

$$|T| + |\Omega| \le \frac{cn}{\sqrt{(s + 1) \log n}}. \tag{2.2}$$

If $T$ is an arbitrary set with cardinality $|T|$ and $\Omega$ is a random set with cardinality $|\Omega|$, then

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le C \, ((s + 1) \log n)^{1/2} \, n^{-s}.$$

The numerical constant $c \ge 0.2791$, provided that $n \ge 512$.

One should interpret this theorem as follows. Fix a set $T$, and consider all sets $\Omega$ that satisfy (2.2).
Of these, the proportion that are not strongly linearly independent is only about $n^{-s}$. One should
be aware that the logarithmic factor in (2.2) is intrinsic when one of the sets is arbitrary. Indeed,
one can construct examples related to the Dirac comb which show that the failure probability is
constant unless the logarithmic factor is present. We omit the details.

The proof of Theorem 6 ultimately involves a variation of the moment method for studying
random matrices, which was initiated by Wigner. The key point of the argument is a bound on the
expected trace of a high power of the random matrix $(n/|\Omega|) \cdot F_{\Omega T}^* F_{\Omega T} - I_{|T|}$. The calculations
involve delicate combinatorial techniques that depend heavily on the structure of the matrix $F$.

This approach can also be used to establish that the smallest singular value of $F_{\Omega T}$ is bounded
well away from zero [CRT06, Thm. 2.2]. This lower bound is essential in many applications, but
we do not need it here. For extensions of these ideas, see also the work of Rauhut [Rau07].

Another result, similar to Theorem 6, suggests that the arbitrary set and the random set do not
contribute equally to the spectral norm. We present one version, whose derivation is adapted from
[Tro07, Thm. 10 et seq.].

Theorem 7. Fix a number $s \ge 1$. Suppose that

$$|T| \log n + |\Omega| \le \frac{cn}{s}.$$

If $T$ is an arbitrary set of cardinality $|T|$ and $\Omega$ is a random set of cardinality $|\Omega|$, then

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le n^{-s}.$$

The proof of this theorem uses Rudelson's selection lemma [Rud99, Sec. 2] in an essential way.
This lemma in turn hinges on the noncommutative Khintchine inequality [LP86, Buc01]. For a
related application of this approach, see [CR07].

Theorems 6 and 7 are interesting, but they do not predict that a far more striking phenomenon
occurs. A random collection of sines has the following property with high probability. To this
collection, one can add an arbitrary set of spikes without sacrificing linear independence.

Theorem 8. Fix a number $s \ge 1$, and assume $n \ge N(s)$. Except with probability $n^{-s}$, a random
set $\Omega$ whose cardinality $|\Omega| \le n/3$ has the following property. For each set $T$ whose cardinality

$$|T| \le \frac{cn}{s \log^5 n},$$

it holds that $\|F_{\Omega T}\|^2 \le 0.5$.

This result follows from the (deep) fact that a random row-submatrix of the DFT matrix satisfies
the restricted isometry property (RIP) with high probability. More precisely, a random set $\Omega$ with
cardinality $|\Omega|$ verifies the following condition, except with probability $n^{-s}$.

$$\frac{|\Omega|}{2n} \le \|F_{\Omega T}\|^2 \le \frac{3 |\Omega|}{2n} \quad\text{when}\quad |T| \le \frac{c \, |\Omega|}{s \log^5 n}. \tag{2.3}$$

This result is adapted from [RV06, Thm. 2.2 et seq.].

The bound (2.3) was originally established by Candès and Tao [CT06] for sets $T$ whose cardinality
$|T| \le c \, |\Omega| / (s \log^6 n)$. Rudelson and Vershynin developed a simpler proof and reduced the exponent
on the logarithm [RV06]. Experts believe that the correct exponent is just one or two, but this
conjecture is presently out of reach.

Proof. Let $c$ be the constant in (2.3). Abbreviate $m = c \, |\Omega| / (s \log^5 n)$, and assume that $m \ge 1$ for
now. Draw a random set $\Omega$ with cardinality $|\Omega|$, so relation (2.3) holds except with probability $n^{-s}$.
Select an arbitrary set $T$ whose cardinality $|T| \le cn / (6 s \log^5 n)$. We may assume that $2|T|/m \ge 1$
because $|\Omega| \le n/3$. Partition $T$ into at most $2|T|/m$ disjoint blocks, each containing no more than
$m$ indices: $T = T_1 \cup T_2 \cup \dots \cup T_{2|T|/m}$. Apply (2.3) to calculate that

$$\|F_{\Omega T}\|^2 \le \frac{2|T|}{m} \max_k \|F_{\Omega T_k}\|^2 \le |T| \cdot \frac{2 s \log^5 n}{c \, |\Omega|} \cdot \frac{3 |\Omega|}{2n} \le \frac{1}{2}.$$

Adjusting constants, we obtain the result when $|\Omega|$ is not too small.

In case $m < 1$, draw a random set $\Omega$ and then draw additional random coordinates to form a
larger set $\Omega'$ for which $c \, |\Omega'| / (s \log^5 n) \ge 1$ and $|\Omega'| \le n/3$. This choice is possible because $n \ge N(s)$.
Apply the foregoing argument to $\Omega'$. Since the spectral norm of a submatrix is not larger than the
norm of the entire matrix, we have the bound $\|F_{\Omega T}\|^2 \le \|F_{\Omega' T}\|^2 \le 0.5$ for each sufficiently small
set $T$. □

2.3. Bounds when both sets are random. To move into the regime where the number of spikes 
and sines is proportional to the dimension n, we need to randomize both sets. The major goal of 
this article is to establish the following theorem. 

Theorem 9. Fix a number $\varepsilon > 0$, and assume that $n \ge N(\varepsilon)$. Suppose that

$$|T| + |\Omega| \le c(\varepsilon) \cdot n.$$

Let $T$ and $\Omega$ be random sets with cardinalities $|T|$ and $|\Omega|$. Then

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le \exp\{-n^{1/2 - \varepsilon}\}.$$

The constant $c(\varepsilon) \ge e^{-C/\varepsilon}$.

Note that the probability bound here is superpolynomial, in contrast with the polynomial bounds
of the previous section. The estimate is essentially optimal. Take $\varepsilon > 0$, and suppose it were possible
to obtain a bound of the form

$$\mathbb{P}\{\|F_{\Omega T}\| = 1\} \le \exp\{-n^{1/2 + \varepsilon}\} \quad\text{where}\quad |T| + |\Omega| \le 2 n^{1/2}.$$

According to Stirling's approximation, there are about $\exp\{n^{1/2} \log n\}$ ways to select two sets
satisfying the cardinality bound. At the same time, the proportion of sets that are linearly dependent
is at most $\exp\{-n^{1/2 + \varepsilon}\}$. Multiplying these two quantities, we find that no pair of sets meeting
the cardinality bound is linearly dependent. This claim contradicts the fact that the Dirac comb
yields a linearly dependent collection of size $2 n^{1/2}$.

Remark 10. As we will see, Theorem 9 holds for every $n \times n$ matrix $A$ with constant spectral
norm and uniformly bounded entries:

$$\|A\| \le 1 \quad\text{and}\quad |a_{\omega t}| \le n^{-1/2} \quad\text{for } \omega, t = 1, 2, \dots, n.$$

The proof does not rely on any special properties of the discrete Fourier transform.

2.4. Random matrix theory. Finally, we consider an application of this approach to random
matrix theory. Note that each column of $F_{\Omega T}$ has $\ell_2$ norm $\sqrt{|\Omega|/n}$. Therefore, it is appropriate to
rescale the matrix by $\sqrt{n/|\Omega|}$ so that its columns have unit norm. Under this scaling, it is possible
that the norm of the matrix explodes when $|\Omega|$ is small in comparison with $n$. The content of the
next result is that this event is highly unlikely if the submatrix is drawn at random.

Theorem 11. Fix a number $\delta \in (0, c)$. Suppose that $n \ge N(\delta)$ and that

$$|T| \le |\Omega| = \delta n.$$

If $T$ and $\Omega$ are random sets with cardinalities $|T|$ and $|\Omega|$, then

$$\mathbb{P}\left\{ \sqrt{\frac{n}{|\Omega|}} \cdot \|F_{\Omega T}\| \ge 9 \right\} \le n^{-C}.$$

For $\delta$ in the range $[c, 1]$, it is evident that

$$\sqrt{\frac{n}{|\Omega|}} \cdot \|F_{\Omega T}\| \le c^{-1/2}.$$

Therefore, we obtain a constant bound for the norm of a normalized random submatrix throughout
the entire parameter range.

Remark 12. Theorem 11 also holds for the class of matrices described in Remark 10.

3. Norms of random submatrices 

In this section, we prove Theorem 9 and Theorem 11. First, we describe some problem simplifications.
Then we provide a moment estimate for the norm of a very small random submatrix, and
we present a device for extrapolating a moment estimate for the norm of a much larger random
submatrix. This moment estimate is used to prove a tail bound, which quickly leads to the two
major results of the paper.

3.1. Reductions. Denote by $P_\delta$ a random $n \times n$ diagonal matrix where exactly $m = \lfloor \delta n \rfloor$ entries
equal one and the rest equal zero. This matrix can be seen as a projector onto a random set of $m$
coordinates. With this notation, the restriction of a matrix $A$ to $m$ random rows and $m$ random
columns can be expressed as $P_\delta A P'_\delta$, where the two projectors are statistically independent from
each other.

Lemma 13 (Square case). Let $A$ be an $n \times n$ matrix. Suppose that $T$ and $\Omega$ are random sets with
cardinalities $|T|$ and $|\Omega|$. If $\delta \ge \max\{|T|, |\Omega|\}/n$, then

$$\mathbb{P}\{\|A_{\Omega T}\| \ge u\} \le \mathbb{P}\{\|P_\delta A P'_\delta\| \ge u\} \quad\text{for } u \ge 0.$$

Proof. It suffices to show that the probability is weakly increasing as the cardinality of one set
increases. Therefore, we focus on $\Omega$ and remove $T$ from the notation for clarity. Let $\Omega$ be a random
subset of cardinality $|\Omega|$. Conditional on $\Omega$, we may draw a uniformly random element $\omega$ from $\Omega^c$,
and put $\Omega' = \Omega \cup \{\omega\}$. This $\Omega'$ is a uniformly random subset with cardinality $|\Omega| + 1$. We have

$$\begin{aligned}
\mathbb{P}\{\|A_\Omega\| \ge u\} &= \mathbb{E}\, I(\|A_\Omega\| \ge u) \\
&\le \mathbb{E}\, I(\|A_{\Omega \cup \{\omega\}}\| \ge u) \\
&= \mathbb{E}\, I(\|A_{\Omega'}\| \ge u) \\
&= \mathbb{P}\{\|A_{\Omega'}\| \ge u\}
\end{aligned}$$

where we have written $I(E)$ for the indicator variable of an event. The inequality follows because
the spectral norm is weakly increasing when we pass to a larger matrix, and so we have the inclusion
of events $\{\Omega : \|A_\Omega\| \ge u\} \subset \{\Omega : \|A_{\Omega \cup \{\omega\}}\| \ge u\}$. □

It can be inconvenient to work with projectors of the form $P_\delta$ because their entries are dependent.
We would prefer a model where coordinates are selected independently. To that end, denote by $R_\delta$
a random $n \times n$ diagonal matrix whose entries are independent 0-1 random variables of mean $\delta$.
This matrix can be seen as a projector onto a random set of coordinates with average cardinality
$\delta n$. The following lemma establishes a relationship between the two types of coordinate projectors.
The argument is drawn from [CR06, Sec. 3].
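The two coordinate models can be contrasted in code. In the sketch below, `fixed_size_set` plays the role of the support of $P_\delta$ (parameterized directly by the cardinality $m$) and `bernoulli_set` plays the role of the support of $R_\delta$; both names, and the choice to pass a `random.Random` instance, are assumptions made for this illustration.

```python
import random


def fixed_size_set(n, m, rng):
    """Support of P_delta: a uniformly random subset of {1, ..., n} with exactly m elements."""
    return set(rng.sample(range(1, n + 1), m))


def bernoulli_set(n, delta, rng):
    """Support of R_delta: each index of {1, ..., n} is kept independently with probability delta."""
    return {j for j in range(1, n + 1) if rng.random() < delta}


def average_bernoulli_size(n, delta, trials, rng):
    """Empirical mean cardinality of the Bernoulli model; it should be close to delta * n."""
    return sum(len(bernoulli_set(n, delta, rng)) for _ in range(trials)) / trials
```

The fixed-size model always returns exactly $m$ coordinates, whereas the cardinality in the Bernoulli model fluctuates around $\delta n$; Lemma 14 controls the price of switching between the two.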

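Lemma 14 (Random coordinate models). Fix a number $\delta$ in $[0, 1]$. For every $n \times n$ matrix $A$,

$$\mathbb{P}\{\|P_\delta A\| \ge u\} \le 2\, \mathbb{P}\{\|R_\delta A\| \ge u\} \quad\text{for } u \ge 0.$$

In particular,

$$\mathbb{P}\{\|P_\delta A P'_\delta\| \ge u\} \le 4\, \mathbb{P}\{\|R_\delta A R'_\delta\| \ge u\} \quad\text{for } u \ge 0.$$

Proof. Given a coordinate projector $R$, denote by $\sigma(R)$ the set of coordinates onto which it projects.
For typographical felicity, we use $\#\sigma(R)$ to indicate the cardinality of this set.
First, suppose that $\delta n$ is an integer. For every $u \ge 0$, we may calculate that

$$\begin{aligned}
\mathbb{P}\{\|R_\delta A\| \ge u\}
&\ge \sum_{j = \delta n}^{n} \mathbb{P}\{\|R_\delta A\| \ge u \mid \#\sigma(R_\delta) = j\} \cdot \mathbb{P}\{\#\sigma(R_\delta) = j\} \\
&\ge \mathbb{P}\{\|R_\delta A\| \ge u \mid \#\sigma(R_\delta) = \delta n\} \cdot \sum_{j = \delta n}^{n} \mathbb{P}\{\#\sigma(R_\delta) = j\} \\
&\ge \tfrac{1}{2}\, \mathbb{P}\{\|P_\delta A\| \ge u\}.
\end{aligned}$$

The second inequality holds because the spectral norm of a submatrix is no larger than the spectral
norm of the matrix. The third inequality relies on the fact [JS68, Thm. 3.2] that the medians of
the binomial distribution binomial($\delta$, $n$) lie between $\delta n - 1$ and $\delta n$.

In case $\delta n$ is not integral, the monotonicity of the spectral norm yields that

$$\mathbb{P}\{\|R_\delta A\| \ge u\} \ge \mathbb{P}\{\|R_{\lfloor \delta n \rfloor / n} A\| \ge u\}.$$

Since $P_{\lfloor \delta n \rfloor / n} = P_\delta$, this point completes the argument. □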
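The binomial median fact that drives the factor of two can be checked numerically for small cases. The sketch below evaluates the upper tail $\mathbb{P}\{\#\sigma(R_\delta) \ge \delta n\}$ in floating point when $\delta n = m$ is an integer; the function name `upper_tail` is hypothetical, and the check is an illustration rather than a proof.

```python
from math import comb


def upper_tail(n, m):
    """P{X >= m} for X ~ Binomial(n, p) with p = m / n, evaluated in floating point.

    This is the probability that the Bernoulli model selects at least
    m = delta * n coordinates when delta * n is an integer.
    """
    p = m / n
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(m, n + 1))
```

For instance, `upper_tail(100, 50)` is slightly above one half, consistent with the median of the binomial distribution sitting at $\delta n$.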

3.2. Small submatrices. We focus on matrices with uniformly bounded entries. The first step in 
the argument is an elementary estimate on the norm of a random submatrix with expected order 
one. In this regime, the bound on the matrix entries determines the norm of the submatrix; the 
signs of the entries do not play a role. The proof shows that most of the variation in the norm 
actually derives from the fluctuation in the order of the submatrix. 

Lemma 15 (Small Submatrices). Let $A$ be an $n \times n$ matrix whose entries are bounded in magnitude
by $n^{-1/2}$. Abbreviate $\varrho = 1/n$. When $q \ge 2 \log n \ge e$,

$$\left( \mathbb{E} \|R_\varrho A R'_\varrho\|^{2q} \right)^{1/2q} \le 2 q \, n^{-1/2}.$$

Proof. By homogeneity, we may rescale $A$ so that its entries are bounded in magnitude by one.
Define the event $\Sigma_{jk}$ where the random submatrix has order $j \times k$.

$$\Sigma_{jk} = \{\#\sigma(R_\varrho) = j \text{ and } \#\sigma(R'_\varrho) = k\}.$$

On this event, the norm of the submatrix can be bounded as

$$\|R_\varrho A R'_\varrho\| \le \|R_\varrho A R'_\varrho\|_F \le \sqrt{jk}.$$

Using elementary inequalities, we may estimate the probability that this event occurs.

$$\mathbb{P}(\Sigma_{jk}) = \binom{n}{j} \binom{n}{k} \varrho^{j+k} (1 - \varrho)^{2n - (j+k)} \le \left(\frac{en}{j}\right)^{j} \left(\frac{en}{k}\right)^{k} n^{-(j+k)} = (e/j)^j \cdot (e/k)^k.$$

With this information at hand, the rest of the proof follows from some easy calculations:

$$\begin{aligned}
\mathbb{E} \|R_\varrho A R'_\varrho\|^{2q}
&= \sum_{j,k=1}^{n} \mathbb{E}\left[ \|R_\varrho A R'_\varrho\|^{2q} \mid \Sigma_{jk} \right] \cdot \mathbb{P}(\Sigma_{jk}) \\
&\le \sum_{j,k=1}^{n} (jk)^q \cdot (e/j)^j \cdot (e/k)^k \\
&= \left[ \sum_{k=1}^{n} k^q \cdot (e/k)^k \right]^2.
\end{aligned}$$

A short exercise in differential calculus shows that the maximum term in the sum occurs when
$k \log k = q$. Write $k_\star$ for the solution to this equation, and note that $k_\star \le q$. Bounding all the
terms by the maximum, we find

$$\sum_{k=1}^{n} k^q \cdot (e/k)^k \le n \cdot \exp\{q \log k_\star - k_\star \log k_\star + k_\star\} \le n \cdot \exp\{q \log k_\star\} \le n \cdot q^q.$$

Combining the last two inequalities, we reach

$$\left( \mathbb{E} \|R_\varrho A R'_\varrho\|^{2q} \right)^{1/2q} \le (n^2 \cdot q^{2q})^{1/2q} = n^{1/q} \cdot q.$$

When $q \ge 2 \log n$, the first term is less than two. □
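The scalar inequality at the heart of this proof, $\sum_{k=1}^{n} k^q (e/k)^k \le n \cdot q^q$, can be sanity-checked numerically. The sketch below works in the log domain to avoid overflow; the function names are our own, and the check covers only the particular $(n, q)$ pairs supplied.

```python
import math


def log_sum_terms(n, q):
    """Return log( sum_{k=1}^{n} k^q * (e/k)^k ), computed with a log-sum-exp."""
    logs = [q * math.log(k) - k * math.log(k) + k for k in range(1, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def moment_sum_bound_holds(n, q):
    """Check the proof's bound: the sum is at most n * q^q (compared in logs)."""
    return log_sum_terms(n, q) <= math.log(n) + q * math.log(q)


def prefactor(n, q):
    """The factor n^(1/q) from the final step; it is below 2 once q >= 2 log n."""
    return n ** (1.0 / q)
```

With $n = 100$ and $q = 10 \ge 2 \log n$, both `moment_sum_bound_holds(100, 10)` and `prefactor(100, 10) < 2` can be evaluated directly.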

Remark 16. This argument delivers a moment estimate that is roughly a factor of $\log q$ smaller
than the one stated. This fact can be used to sharpen the major results slightly at a cost we prefer
to avoid.

3.3. Extrapolation. The key technique in the proof is an extrapolation of the moments of the
norm of a large random submatrix from the moments of a smaller random submatrix. Without
additional information, extrapolation must be fruitless because the signs of matrix entries play a
critical role in determining the spectral norm. It turns out that we can fold in information about
the signs by incorporating a bound on the spectral norm of the matrix. The proof, which we provide
in Appendix A, ultimately depends on the minimax property of the Chebyshev polynomials. The
method is essentially the same as the one Bourgain and Tzafriri develop to prove Proposition 2.7
in [BT91]. See also [Tro08, Sec. 7].

Proposition 17. Suppose that $A$ is an $n \times n$ matrix with $\|A\| \le 1$. Let $q$ be an integer that
satisfies $13 \log n \le q \le n/2$. Write $\varrho = 1/n$, and choose $\delta$ in the range $[1/n, 1]$. For each $\lambda \in (0, 1)$,
it holds that

$$\left( \mathbb{E} \|R_\delta A R'_\delta\|^{2q} \right)^{1/2q} \le 8 \delta^\lambda \max\left\{ 1, \ n^\lambda \left( \mathbb{E} \|R_\varrho A R'_\varrho\|^{2q} \right)^{1/2q} \right\}.$$



Although the statement is a little complicated, we require the full power of this estimate. As
usual, the parameter $q$ is the moment that we seek. The proposition extrapolates from a matrix
of expected order 1 up to a matrix of expected order $\delta n$. The parameter $\lambda$ is a tuning knob that
controls how much of the estimate is determined by the spectral norm of the full matrix and how
much is determined by the norm bound for small submatrices. Indeed, the first member of the
maximum reflects the spectral norm bound $\|A\| \le 1$.

3.4. A tail bound. We are now prepared to develop a tail bound for the random norm $\|R_\delta A R'_\delta\|$.

Lemma 18 (Tail Bound). Let $A$ be an $n \times n$ matrix for which

$$\|A\| \le 1 \quad\text{and}\quad |a_{jk}| \le n^{-1/2} \quad\text{for } j, k = 1, 2, \dots, n.$$

Choose $\delta$ from $[1/n, 1]$ and an integer $q$ that satisfies $13 \log n \le q \le n/2$. For each $\lambda \in (0, 1)$, it
holds that

$$\mathbb{P}\left\{ \|R_\delta A R'_\delta\| \ge 8 \delta^\lambda \max\{1, 2q \, n^{\lambda - 1/2}\} \cdot u \right\} \le u^{-2q} \quad\text{for } u \ge 1.$$

Proof of Lemma 18. Choose an integer $q$ in the range $[13 \log n, n/2]$. Markov's inequality allows
that

$$\mathbb{P}\left\{ \|R_\delta A R'_\delta\| \ge \left( \mathbb{E} \|R_\delta A R'_\delta\|^{2q} \right)^{1/2q} \cdot u \right\} \le u^{-2q}.$$

Therefore, we may establish the result by obtaining a moment estimate. This estimate is a direct
consequence of Lemma 15 and Proposition 17:

$$\left( \mathbb{E} \|R_\delta A R'_\delta\|^{2q} \right)^{1/2q} \le 8 \delta^\lambda \max\left\{ 1, \ n^\lambda \cdot 2q \, n^{-1/2} \right\}.$$

Combine the two bounds to complete the argument. □

The two major results of this paper, Theorem 9 and Theorem 11, both follow from a simple
corollary of Lemma 18.

Corollary 19. Suppose that $T$ and $\Omega$ are random sets with cardinalities $|T|$ and $|\Omega|$. Assume
$\delta \ge \max\{|T|, |\Omega|\}/n$. For each integer $q$ that satisfies $13 \log n \le q \le n/2$ and for $\lambda \in (0, 1)$, it holds
that

$$\mathbb{P}\left\{ \|F_{\Omega T}\| \ge 8 \delta^\lambda \max\{1, 2q \, n^{\lambda - 1/2}\} \cdot u \right\} \le 4 u^{-2q} \quad\text{for } u \ge 1.$$

Proof. Consider the matrix $A = F$. Perform the reductions from Section 3.1, Lemma 13 and
Lemma 14. Then apply the tail bound, Lemma 18. □

3.5. Proof of Theorem 9. The content of Theorem 9 is to provide a bound on $\delta$ which ensures
that $\|F_{\Omega T}\|$ is somewhat less than one with extremely high probability. To that end, we want to
make $\lambda$ close to zero and $q$ large. The following selections accomplish this goal:

$$\lambda = \frac{\log 16}{\log(1/\delta)} \quad\text{and}\quad q = \lfloor 0.5 \, n^{1/2 - \lambda} \rfloor.$$

Note that we can make $\lambda$ as small as we like by taking $\delta$ sufficiently small. For any value of $\lambda < 0.5$,
the number $q$ satisfies the requirements of Corollary 19 as soon as $n$ is sufficiently large.
With these selections, $8 \delta^\lambda = 0.5$ and $2q \, n^{\lambda - 1/2} \le 1$.
Now, the bound of Corollary 19 results in

$$\mathbb{P}\{\|F_{\Omega T}\| \ge 0.5 u\} \le 4 u^{-2q}.$$

For $u = \sqrt{2}$, we see that

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le 4 \cdot 2^{-q}.$$

It follows that, for any assignable $\varepsilon > 0$, we can make

$$\mathbb{P}\{\|F_{\Omega T}\|^2 \ge 0.5\} \le \exp\{-n^{1/2 - \varepsilon}\}$$

provided that $\delta \le e^{-C/\varepsilon} = c(\varepsilon)$ and that $n \ge N(\varepsilon)$.
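To see how the parameter selections interact, the following sketch evaluates them for concrete values of $n$ and $\delta$. It assumes the selections $\lambda = \log 16 / \log(1/\delta)$ and $q = \lfloor 0.5\, n^{1/2 - \lambda} \rfloor$, which is our reading of the displayed choices in this proof; the function name and return format are hypothetical.

```python
import math


def theorem9_parameters(n, delta):
    """Evaluate the assumed selections lambda = log 16 / log(1/delta), q = floor(0.5 n^(1/2 - lambda)).

    Returns (lam, q, coeff, admissible), where
      coeff      = 8 * delta^lam * max(1, 2 q n^(lam - 1/2)) is the leading factor
                   in Corollary 19, and
      admissible = True when 13 log n <= q <= n / 2, as Corollary 19 requires.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    lam = math.log(16.0) / math.log(1.0 / delta)
    q = math.floor(0.5 * n ** (0.5 - lam))
    coeff = 8.0 * delta ** lam * max(1.0, 2.0 * q * n ** (lam - 0.5))
    admissible = 13.0 * math.log(n) <= q <= n / 2.0
    return lam, q, coeff, admissible
```

Evaluating the function for moderate $n$ shows that the requirement $13 \log n \le q$ can fail, which illustrates why the theorem is stated only for $n \ge N(\varepsilon)$.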

3.6. Proof of Theorem 11. To establish Theorem 11, we must make the parameter $\lambda$ as close to
0.5 as possible. Choose

where C is a large constant. These choices are acceptable once $\delta$ is sufficiently small and $n$ is
sufficiently large.

Corollary 19 delivers

$$\mathbb{P}\left\{ \|F_{\Omega T}\| \ge 8.9 \, \delta^{1/2} u \right\} \le 4 u^{-C \log n}.$$

For $u = 90/89$, we reach

$$\mathbb{P}\left\{ \|F_{\Omega T}\| \ge 9 \, \delta^{1/2} \right\} \le n^{-C}$$

adjusting constants as necessary. Finally, we transfer the factor $\delta^{1/2}$ to the other side of the
inequality and set $\delta = |\Omega| / n$ to complete the proof.

4. Numerical Experiments 

The theorems of this paper provide gross information about the norm of a random submatrix of 
the DFT. To complement these results, we performed some numerical experiments to give a more 
detailed empirical view. 

The first set of experiments concerns random square submatrices of a DFT matrix of size $n$,
where we varied the parameter $n$ over several orders of magnitude. Given a value of $\delta \in (0, 0.5)$,
we formed one hundred random submatrices with dimensions $\delta n \times \delta n$ and computed the average
spectral norm of these matrices. We did not plot data when $\delta \in (0.5, 1)$ because the norm of a
random submatrix equals one.

[Plot: "Norm of random square submatrix drawn from n x n DFT"; horizontal axis: proportion of rows/cols ($\delta$), from 0.05 to 0.5.]

Figure 1. Sample average of the norm of a random $\delta n \times \delta n$ submatrix drawn from
the $n \times n$ DFT.

Figure 1 shows the raw data for this first experiment. As $n$ grows, one can see that the norm
tends toward an apparent limit: $2\sqrt{\delta(1 - \delta)}$. In Figure 2, we re-scale each matrix by $\delta^{-1/2}$ so its
columns have unit norm and then compute the average spectral norm. More elaborate behavior is
visible in this plot:

• For $\delta = 1/n$, the norm of a random submatrix is identically equal to one.

• For $\delta = 2/n$, the norm tends toward $\sqrt{1 + 2^{-1/2}} = 1.3066\ldots$, which can be verified by a
relatively simple analytic computation.

• The maximum value of the norm appears to occur at $\delta = 2/\sqrt{n}$.

• The apparent limit of the scaled norm is $2\sqrt{1 - \delta}$, in agreement with the first figure.

These phenomena are intriguing, and it would be valuable to understand them in more detail.
Unfortunately, the methods of this paper are not refined enough to provide an explanation.
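A small-scale version of the square experiment can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the procedure used to produce the figures: the spectral norm is estimated by power iteration (which approaches the true norm from below), and all function names, the iteration count, and the seeding scheme are our own choices.

```python
import cmath
import math
import random


def dft_submatrix(n, rows, cols):
    """Submatrix of the n x n DFT restricted to 1-based index lists rows and cols."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) / math.sqrt(n) for c in cols]
            for r in rows]


def spectral_norm(M, iters=200, seed=0):
    """Estimate ||M|| by power iteration on M* M; the estimate never exceeds ||M||."""
    rng = random.Random(seed)
    rows, cols = len(M), len(M[0])
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
    sigma = 0.0
    for _ in range(iters):
        scale = math.sqrt(sum(abs(x) ** 2 for x in v))
        if scale == 0.0:
            return 0.0
        v = [x / scale for x in v]                                # unit vector
        w = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]   # w = M v
        sigma = math.sqrt(sum(abs(x) ** 2 for x in w))            # ||M v||
        v = [sum(M[i][j].conjugate() * w[i] for i in range(rows))
             for j in range(cols)]                                # v = M* w
    return sigma


def average_norm(n, m, trials, seed=0):
    """Average estimated norm of random m x m submatrices of the n x n DFT."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rows = rng.sample(range(1, n + 1), m)
        cols = rng.sample(range(1, n + 1), m)
        total += spectral_norm(dft_submatrix(n, rows, cols))
    return total / trials


def quartercircle(delta):
    """The apparent limit 2 * sqrt(delta * (1 - delta)) observed in the experiments."""
    return 2.0 * math.sqrt(delta * (1.0 - delta))
```

One might compare `average_norm(n, m, trials)` against `quartercircle(m / n)` for increasing $n$; at small $n$ the two need not agree closely, since the limit is only apparent as $n$ grows.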

In the second set of experiments, we studied the norm of a random rectangular submatrix of the
$128 \times 128$ DFT matrix. We varied the proportion $\delta_T$ of columns and the proportion $\delta_\Omega$ of rows
in the range $(0, 1)$. For each pair $(\delta_T, \delta_\Omega)$, we drew 100 random submatrices and computed the
average norm. Figure 3 shows the raw data. The apparent trend is that

$$\mathbb{E} \|P_{\delta_\Omega} F P'_{\delta_T}\| \approx 2\sqrt{\delta(1 - \delta)} \quad\text{where}\quad \delta = \frac{\delta_T + \delta_\Omega}{2}.$$

Figure 4 shows the same data, rescaled by $\max\{|T|, |\Omega|\}^{-1/2}$. As in the square case, this plot
reveals a variety of interesting phenomena that are worth attention.

[Plot: "Scaled norm of random square submatrix drawn from n x n DFT"; horizontal axis: proportion of rows/cols ($\delta$).]

Figure 2. Sample average of the norm of a random $\delta n \times \delta n$ submatrix drawn from
the $n \times n$ DFT and re-scaled by $\delta^{-1/2}$.

[Plot: "Norm of random rectangular submatrix drawn from 128 x 128 DFT".]

Figure 3. Sample average of the norm of a random $\delta_\Omega n \times \delta_T n$ submatrix drawn
from the $128 \times 128$ DFT matrix.

[Plot: "Scaled norm of random rectangular submatrix drawn from 128 x 128 DFT".]

Figure 4. Sample average of the norm of a random $\delta_\Omega n \times \delta_T n$ submatrix drawn
from the $128 \times 128$ DFT matrix and rescaled by $\max\{|T|, |\Omega|\}^{-1/2}$.

5. Further Research Directions

The present research suggests several directions for future exploration.

(1) It may be possible to improve the constants in Proposition 17 using a variation of the
current approach. Instead of using the Chebyshev polynomial to estimate the coefficients
of the polynomial that arises in the proof, one might use the nonnegative polynomial of least
deviation from zero on the interval $[0, 1]$. The paper [BK85] is relevant in this connection:
its authors identify the nonnegative polynomials with least deviation from zero with respect
to $L_p$ norms for $p < \infty$. The $p = \infty$ case appears to be open, and uniqueness may be an
issue.

(2) Instead of reducing the problem to the square case, it would be valuable to understand
the rectangular case directly. Again, it may be possible to adapt Proposition [T7] to handle
this situation. This approach would probably require the bivariate polynomials of least
deviation from zero identified by Sloss [Slo65].

(3) A harder problem is to determine the limiting behavior of the expected norm of a random 
submatrix as the dimension grows and the proportion of rows and columns remains fixed. 
We frame the following conjecture. 

Conjecture 20 (Quartercircle Law). A random square submatrix of the $n \times n$ DFT satisfies

\[ \mathbb{E}\,\|P_\delta F P'_\delta\| \le 2\sqrt{\delta(1-\delta)}. \]
The inequality becomes an equality as $n \to \infty$.

One can develop a similar statement about random rectangular submatrices. At present, 
however, these conjectures are out of reach. 

(4) Finally, one might study the behavior of the lower singular value of a (suitably normalized) 
random submatrix drawn from the DFT. There are some results available when one set, 
say T, is fixed [CRT06] . It is possible that the behavior will be better when both sets 
are random. The present methods do not seem to provide much information about this 
problem. 
Appendix A. Chebyshev Extrapolation

One of the major tools in the proof of Theorem 9 is Proposition [TTl. This result extrapolates the
moments of the norm of a large random submatrix drawn from a fixed matrix, given information
about a small random submatrix. An important idea behind the result is to fold information about
the spectral norm of the matrix into the estimate. The extrapolation technique is due to Bourgain
and Tzafriri [BT91]. We require a variant of their result, so we repeat the argument in its entirety.
The complete statement of the result follows.

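Proposition 21. Suppose that $A$ is an $n \times n$ matrix with $\|A\| \le 1$. Let $q$ be an integer that satisfies
$13 \log n \le q \le n/2$. Choose parameters $\varrho \in (0, 1)$ and $\delta \in [\varrho, 1]$. For each $\lambda \in [0, 1]$, it holds that

\[ \left( \mathbb{E}\,\|R_\delta A R'_\delta\|^{2q} \right)^{1/2q} \le 8\,\delta^{\lambda} \max\left\{ 1,\; \varrho^{-\lambda} \left( \mathbb{E}\,\|R_\varrho A R'_\varrho\|^{2q} \right)^{1/2q} \right\}. \]

The same result holds if we replace $R'_\delta$ by $R_\delta$ and replace $R'_\varrho$ by $R_\varrho$.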

V. A. Markov observed that the coefficients of an arbitrary polynomial can be bounded in
terms of the coefficients of a Chebyshev polynomial because Chebyshev polynomials are the unique
polynomials of least deviation from zero on the unit interval. See [Tim63, Sec. 2.9] for more details.

Proposition 22 (Markov). Let $p(t) = \sum_{k=0}^{r} c_k t^k$. The coefficients of the polynomial $p$ satisfy the
inequality

\[ |c_k| \le \frac{r^k}{k!} \max_{|t| \le 1} |p(t)| \le e^r \max_{|t| \le 1} |p(t)| \]

for each $k = 0, 1, \ldots, r$.
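
As a sanity check on Proposition 22, the bound can be tested numerically on the Chebyshev polynomials $T_r$, whose maximum modulus on $[-1, 1]$ equals one. The sketch below is illustrative only; the helper names `chebyshev_coeffs` and `satisfies_markov_bound` are hypothetical, and the small relative tolerance is an assumption made to absorb floating-point rounding in the cases where the bound holds with equality.

```python
import math
from numpy.polynomial import chebyshev as cheb

def chebyshev_coeffs(r):
    """Power-basis coefficients c_0, ..., c_r of the Chebyshev polynomial T_r."""
    return cheb.cheb2poly([0] * r + [1])

def satisfies_markov_bound(coeffs, max_abs, tol=1e-9):
    """Check |c_k| <= r^k / k! * max|p| <= e^r * max|p| for every k.

    `max_abs` is the maximum of |p(t)| over |t| <= 1, supplied by the caller.
    """
    r = len(coeffs) - 1
    for k, c in enumerate(coeffs):
        middle = r**k / math.factorial(k) * max_abs
        upper = math.exp(r) * max_abs
        if abs(c) > middle * (1 + tol) or middle > upper * (1 + tol):
            return False
    return True

if __name__ == "__main__":
    for r in range(1, 13):
        print(r, satisfies_markov_bound(chebyshev_coeffs(r), 1.0))
```

For example, $T_4(t) = 8t^4 - 8t^2 + 1$ has $|c_2| = 8 = 4^2/2!$, so the first inequality is attained for that coefficient.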

With Markov's result at hand, we can prove Proposition 21.

Proof of Proposition 21. We establish the result when the two diagonal projectors are independent;
the other case is almost identical because this independence is never exploited. Define the function

\[ F(s) = \mathbb{E}\,\|R_s A R'_s\|^{2q} \quad\text{for } s \in [0, 1]. \]

Note that $F(s) \le 1$ because $\|R_s A R'_s\| \le \|A\| \le 1$. Furthermore, $F$ does not decrease.

The function $F$ is comparable with a polynomial. Use the facts that $2q$ is even and that $A$ has
dimension $n$ to check the inequalities

\[ F(s) \le \mathbb{E}\,\operatorname{trace}\left[ (R_s A R'_s)^* (R_s A R'_s) \right]^q \le n F(s). \tag{A.1} \]

Define a second function

\[ p(s) = \mathbb{E}\,\operatorname{trace}\left[ (R_s A R'_s)^* (R_s A R'_s) \right]^q = \mathbb{E}\,\operatorname{trace}(A^* R_s A R'_s)^q, \]

where we used the cyclicity of the trace and the fact that $R_s$ and $R'_s$ are diagonal matrices with
0-1 entries. Expand the product and compute the expectation using the additional fact that the
entries of the diagonal matrices are independent random variables of mean $s$. We discover that $p$
is a polynomial of maximum degree $2q$ in the variable $s$:

\[ p(s) = \sum_{k=1}^{2q} c_k s^k. \]

The polynomial has no constant term because $R_0 = 0$.
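
For a very small matrix, the functions $F$ and $p$ can be computed exactly by enumerating every pair of 0-1 diagonal matrices, which makes it possible to check (A.1), the monotonicity of $F$, and the identity $p(0) = 0$ directly. The following sketch assumes independent diagonal entries that equal one with probability $s$, as in the proof; the function name `exact_moments` and the choice $n = 3$, $q = 2$ are illustrative assumptions.

```python
import itertools
import numpy as np

def exact_moments(A, q, s):
    """Return (F(s), p(s)) by enumerating all pairs of 0-1 diagonal masks.

    F(s) = E ||R A R'||^(2q) and p(s) = E trace[(R A R')^* (R A R')]^q,
    where the diagonal entries of R and R' are independent 0-1 variables
    with mean s. The cost is 4^n, so this is only usable for tiny n.
    """
    n = A.shape[0]
    masks = list(itertools.product([0.0, 1.0], repeat=n))
    F_val, p_val = 0.0, 0.0
    for r in masks:
        for rp in masks:
            ones = sum(r) + sum(rp)
            weight = s**ones * (1 - s) ** (2 * n - ones)
            sing = np.linalg.svd(np.diag(r) @ A @ np.diag(rp), compute_uv=False)
            F_val += weight * sing[0] ** (2 * q)      # largest singular value
            p_val += weight * np.sum(sing ** (2 * q))  # trace of (M^* M)^q
    return F_val, p_val

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A /= np.linalg.norm(A, 2)  # normalize so that ||A|| = 1
    for s in np.linspace(0.0, 1.0, 6):
        print(s, exact_moments(A, 2, s))
```

The trace of $(M^* M)^q$ is the sum of the $2q$-th powers of the singular values of $M$, which is why it lies between the largest such power and $n$ times that power.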
We can use Markov's technique to bound the coefficients of the polynomial. First, make the
change of variables $s = \varrho t^2$ to see that

\[ |p(\varrho t^2)| \le n F(\varrho t^2) \le n F(\varrho) \quad\text{for } |t| \le 1. \]

The first inequality follows from (A.1) and the second follows from the monotonicity of $F$. The
polynomial $p(\varrho t^2)$ has degree $4q$ in the variable $t$, so Proposition 22 yields

\[ |c_k|\, \varrho^k \le n e^{4q} F(\varrho) \quad\text{for } k = 1, 2, \ldots, 2q. \tag{A.2} \]

Evaluate this expression at $\varrho = 1$ and recall that $F \le 1$ to obtain a second bound,

\[ |c_k| \le n e^{4q} \quad\text{for } k = 1, 2, \ldots, 2q. \tag{A.3} \]

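To complete the proof, we evaluate the polynomial at a point $\delta$ in the range $[\varrho, 1]$. Fix a value
of $\lambda$ in $[0, 1]$, and set $K = \lfloor 2\lambda q \rfloor$. In view of (A.1), (A.2), and (A.3), we obtain

\begin{align*}
F(\delta) \le p(\delta) &\le \sum_{k=1}^{K} |c_k|\, \delta^k + \sum_{k=K+1}^{2q} |c_k|\, \delta^k \\
&\le n e^{4q} \left[ K (\delta/\varrho)^{K} F(\varrho) + (2q - K)\, \delta^{K+1} \right] \\
&\le n e^{4q} \delta^{2\lambda q} \left[ K \varrho^{-2\lambda q} F(\varrho) + (2q - K) \right] \\
&\le n e^{4q} \delta^{2\lambda q} \cdot 2q \max\{ 1, \varrho^{-2\lambda q} F(\varrho) \}.
\end{align*}

The third and fourth inequalities use the conditions $\delta/\varrho \ge 1$ and $\delta \le 1$, and the last bound is an
application of Jensen's inequality. Taking the $(2q)$th root, we reach

\[ F(\delta)^{1/2q} \le (2qn)^{1/2q}\, e^2\, \delta^{\lambda} \max\{ 1, \varrho^{-\lambda} F(\varrho)^{1/2q} \}. \]
The leading constant is less than 8, provided that $13 \log n \le q \le n/2$. □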
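
The claim about the leading constant can be verified in one line. The computation below is a sketch that uses only the hypotheses $13 \log n \le q \le n/2$; the decimal values are rounded upward.

```latex
\[
(2qn)^{1/2q} \;\le\; (n^2)^{1/2q} \;=\; \exp\!\left( \frac{\log n}{q} \right) \;\le\; e^{1/13} \;<\; 1.08,
\qquad\text{so}\qquad
(2qn)^{1/2q}\, e^{2} \;<\; 1.08 \times 7.39 \;<\; 8.
\]
```

The first step uses $2q \le n$, and the second uses $q \ge 13 \log n$.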